Record: Order-Adaptive Entropy Gating + BackoffNgramMixer (val_bpb=0.5466)#798
Open
travispchen wants to merge 1 commit into openai:main from
Conversation
…5466, 3-seed mean)

Adds order-adaptive entropy gating on top of PR openai#779's BackoffNgramMixer + Drift-Free TTT. Per-order entropy centers replace the single threshold: higher n-gram orders are trusted at lower entropy. 3-seed validation: 0.5478, 0.5458, 0.5463 (mean 0.5466, std 0.0010). All artifacts strictly under 16,000,000 bytes.

Co-Authored-By: Travis Chen <travispchen@gmail.com>
newjordan pushed a commit to newjordan/parameter-golf-1 that referenced this pull request on Mar 26, 2026
Add per-order entropy centers from PR openai#798 insight: order 7: center=3.0, order 6: 3.2, order 5: 3.5, order 4: 3.8, order 3: 4.2, order 2: 4.5 Higher orders trusted at lower entropy, lower orders only at high uncertainty. Cubric multipliers applied on top. Original X-WING (0.5644) untouched in concepts/xwing/. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
newjordan pushed a commit to newjordan/parameter-golf-1 that referenced this pull request on Mar 26, 2026
PR openai#798's approach on our engine: per-order entropy centers (7:3.0, 6:3.2, 5:3.5, 4:3.8, 3:4.2, 2:4.5) without cubric. Testing if cubric was hurting when combined with per-order gating. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Order-Adaptive Entropy Gating + BackoffNgramMixer + Drift-Free TTT
val_bpb: 0.5466 (3-seed mean, std 0.0010) | ~15.99 MB | 8×H100 SXM
Adds order-adaptive entropy gating on top of PR #779's BackoffNgramMixer + Drift-Free TTT submission. Instead of using a single entropy center for all n-gram orders, each order gets its own threshold — higher orders are trusted at lower entropy, lower orders only kick in when the model is more uncertain.
Results (8×H100 80GB SXM, PyTorch 2.9.1+cu128)
What Changed vs PR #779
PR #779 uses a single `entropy_center=3.5` for all n-gram orders. We replace this with per-order entropy centers (order 7: 3.0, 6: 3.2, 5: 3.5, 4: 3.8, 3: 4.2, 2: 4.5):

- Higher-order n-grams (7, 6, 5) are trusted at lower model entropy: when the model is fairly confident, the precise n-gram correction refines the prediction.
- Lower-order n-grams (4, 3, 2) only intervene at higher entropy: when the model is confused enough that even coarse statistics help.
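The per-order gating idea can be sketched as follows. The PR does not show the gating function itself, so a sigmoid opening around each order's center (with a hypothetical `width` parameter) is used here as a plausible stand-in; only the centers come from the referenced commits:

```python
import torch

# Per-order entropy centers quoted in the referenced commits.
ENTROPY_CENTERS = {7: 3.0, 6: 3.2, 5: 3.5, 4: 3.8, 3: 4.2, 2: 4.5}

def order_gates(model_logits: torch.Tensor, width: float = 1.0) -> dict:
    """Map model entropy to a per-order mixing gate in [0, 1].

    Each order's gate opens as entropy rises past its center, so
    higher orders (lower centers) engage first, and order-2 stats
    only contribute when the model is quite uncertain. `width` is
    an assumed smoothing parameter, not from the PR.
    """
    probs = torch.softmax(model_logits, dim=-1)
    # Shannon entropy in nats per position.
    entropy = -(probs * torch.log(probs.clamp_min(1e-9))).sum(dim=-1)
    return {n: torch.sigmoid((entropy - c) / width) for n, c in ENTROPY_CENTERS.items()}
```

For a uniform distribution over 32 tokens (entropy ≈ 3.47 nats), the order-7 gate is already mostly open while the order-2 gate is still mostly closed, matching the "higher orders trusted at lower entropy" behavior.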
This is an eval-time-only change. It modifies how existing n-gram statistics are combined with neural predictions, not when data enters the cache. The n-gram cache is still updated strictly AFTER scoring each batch (score-first).
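The score-first ordering described above can be sketched with a toy cache; the class and function names here are illustrative stand-ins, not the PR's actual API:

```python
from collections import Counter

class NgramCache:
    """Toy stand-in for the n-gram cache (the real one stores per-order stats)."""
    def __init__(self):
        self.counts = Counter()

    def update(self, tokens):
        self.counts.update(tokens)

def evaluate(batches, score_fn):
    """Score-first loop: each batch is scored before it enters the cache.

    `score_fn(batch, cache)` sees only statistics from PREVIOUS batches,
    so no batch ever influences its own score.
    """
    cache = NgramCache()
    scores = []
    for batch in batches:
        scores.append(score_fn(batch, cache))  # cache excludes current batch
        cache.update(batch)                    # updated strictly AFTER scoring
    return scores
```

The first batch is always scored against an empty cache, which is what makes the update ordering (rather than the gating change) the thing that keeps the method legal.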
Legality
Ablation
Credits